One indicator of a university’s educational quality is the proportion of
enrolled students who graduate within four years. The number of students who
graduate on time is typically lower than the number who enrolled in the
corresponding year. A low graduation rate can harm both the university’s
reputation and its accreditation because it indicates that fewer students are
completing their degrees. Student activity, economic circumstances, and other
issues all play a role in why some students are unable to complete their
degrees on time. As a result, stakeholders need a model that can predict
whether students will graduate on time, as a means of evaluation and as a basis
for policy actions. This research proposes a model that converts tabular data
into an image format, extracts features with a deep learning convolutional
neural network (CNN), and then classifies the extracted features using a
variety of machine learning classification algorithms: decision tree, random
forest, Naive Bayes, support vector machine (SVM), and k-nearest neighbor
(K-NN). The classification model trained on the extracted features had a 96.1%
accuracy rate, while the classification model trained on the original data
achieved a 71.2% accuracy rate.
This is an open access article under the CC BY-SA license. 
1. INTRODUCTION

Since education plays an inextricable part in a country’s development, universities work hard to 
provide top-notch instruction that will benefit students in the long run [1]-[3]. The primary mission of 
universities is to produce highly skilled workers who can compete successfully in the labor market; graduates 
can then use these acquired abilities to secure employment and advance their careers [4]. In practice, however,
many students run out of time or do not manage to complete their coursework at all [5]. Students’ academic
achievement matters greatly to universities because a high rate of student success in the classroom is a hallmark
of a high-quality university. A growing number of students who do not graduate on time increases the volume of
academic data held for students who are still enrolled and can threaten the reputation and accreditation value of
the tertiary institution.
Authorities in postsecondary institutions need to predict when students will graduate as a control and
anticipation measure. Doing so is also a step toward anticipating dropout, which is a serious problem in an
education system because it causes financial, social, and economic losses. Students and the government are the
primary actors in this scenario, so we focus on the political, academic, and economic dimensions of education [6].
Data mining (DM) has been shown to be useful for gaining insight from such data [7]. The scope and variety of
educational data make it challenging to process accurately. Educational data mining (EDM) can be used to address
this issue, and data preparation is a crucial part of the EDM process for obtaining the best results [8], [9].
Neural networks [10], Naive Bayes,
decision trees, k-nearest neighbor (K-NN), support vector machine (SVM), and discriminant analysis are all 
examples of often-used classification methods in EDM research [11]-[15]. Predicting student success in higher 
education using machine learning models has become common practice. Attribute data collected through 
academic processes and saved in custom-built applications or databases are processed using a variety of 
machine learning (ML) algorithms [7], [16]. 

Multi-layered artificial neural networks like the deep learning convolutional neural network (CNN) 
model are frequently employed in image processing [17]. The CNN is a popular deep learning model due to its 
high performance in a variety of use cases. The field of education is one that has recently seen an uptick in 
interest in deep learning [17], [18]. While CNN has been widely hailed for its ability to analyze and recognize 
images, it has also been shown that the feature extraction process using CNN accurately represents the original 
data form in many other contexts, including educational data mining. Each neuron in a CNN receives multiple
inputs, computes their dot product with its weights, and applies an activation function; at the final (fully
connected) layer, a loss function measures the discrepancy between the predicted value and the expected
output value [17], [19]. Transforming non-image data into an image format is motivated by the hypothesis that
the image format better represents the relationships between features, which the CNN can then recognize for
analysis and prediction [19]. In order to
better express the relationship between features, such as categories or feature similarities, the sequence of 
features is sometimes reorganized in 2-D space during the transformation of tabular (non-image) data [20], 
[21]. Tabular data can be used in feature extraction with a deep learning convolutional neural network to
predict graduation, and this approach has been shown to produce better results than conventional data mining:
the best conventional accuracy was 77.35%, obtained with the random forest algorithm, whereas a deep learning
CNN applied to the same data reached 87.44%.
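
As a minimal sketch of the neuron computation described above, the following Python snippet computes a dot product, applies an activation function, and evaluates a loss for a single prediction. The input values, weights, the sigmoid output, and the binary cross-entropy loss are illustrative assumptions; the paper does not state which output activation or loss function its CNN uses.

```python
import math


def neuron(inputs, weights, bias):
    """Dot product of inputs and weights plus bias, followed by ReLU."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, z)


def sigmoid(z):
    """Squash a real value into (0, 1) so it can be read as a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Loss for one example: discrepancy between prediction and target."""
    p = min(max(y_prob, eps), 1.0 - eps)  # guard against log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


# Hypothetical inputs and weights, chosen only for illustration.
hidden = neuron([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=0.1)
prob = sigmoid(hidden)                  # predicted probability of class 1
loss = binary_cross_entropy(1, prob)    # target label is 1
```

The loss shrinks as `prob` approaches the target label, which is the signal a network uses to adjust its weights during training.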

In this paper, we propose a model for converting tabular data into image format and extracting features with a
deep learning CNN, and we classify the extracted features using a variety of machine learning algorithms:
decision tree, random forest, Naive Bayes, SVM, and K-NN. The data comprise 4,041 graduates from 4
bachelor’s-level programs at the Faculty of Computer Science, Universitas Dian Nuswantoro.
The best values of recall, precision, and F-measure are determined by comparing the machine learning models’
output on the original data with that obtained using the CNN-extracted features.


2. METHOD 
2.1. Dataset 

The research dataset contains 4,041 records, all of which relate to graduates of 4 undergraduate study
programs at the Faculty of Computer Science, Universitas Dian Nuswantoro, with data collected from 2012 to
2018. There are 2,293 records for study code 11, 658 for study code 12, 666 for study code 14, and 424 for
study code 15. Figure 1 compares the number of records per label; the two labels contain approximately the
same amount of data. Label 1 has 2,073 records (study duration of no more than 48 months, i.e., within
4 years). Label 2 contains 1,968 records (study duration of more than 48 months, i.e., delayed beyond
4 years).
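
The record counts above can be cross-checked with a few lines of Python. This is only a consistency check on the figures reported in the text; the dictionary names are illustrative and not part of the original study.

```python
# Record counts per study code and per label, as reported in the text.
records_per_program = {"11": 2293, "12": 658, "14": 666, "15": 424}
records_per_label = {1: 2073, 2: 1968}  # 1: on time, 2: delayed

total = sum(records_per_program.values())

# Share of each label in the whole dataset.
label_share = {
    label: count / total for label, count in records_per_label.items()
}

print(total)                         # 4041
print(round(label_share[1], 3))      # 0.513
print(round(label_share[2], 3))      # 0.487
```

Both breakdowns sum to 4,041 records, and the two labels are split roughly 51% to 49%, which supports the description of the classes as approximately balanced.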
Figure 1. The number of records in the dataset
Table 1 describes each attribute used in the dataset: study program, hometown (domicile), age, marital
status, number of scholarships, number of leave applications, number of student activities, number of
achievements, and GPA1, GPA2, GPA3, and GPA4.


Table 1. Attribute of dataset 


Attribute name                 Description
Study program                  4 undergraduate study programs (A11, A12, A14, and A15)
Hometown                       1: for domicile in the city
                               2: for domicile outside the city
Age                            1: age > 0 and age < 12
                               2: age >= 12 and age < 26
                               3: age >= 26 and age < 46
                               4: age >= 46
Marital                        1: Married
                               2: Not married
Number of scholarships         Number of scholarships received
Number of leave applications   Number of times you have applied for leave
Number of student activities   Number of student activities participated in
Number of achievements         Number of achievements or certificates ever obtained
GPA1                           Grade 1 for grade point average
GPA2                           Grade 2 for grade point average
GPA3                           Grade 3 for grade point average
GPA4                           Grade 4 for grade point average
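
The categorical encoding of the age attribute in Table 1 can be sketched as a small Python function. The function name is hypothetical, and the lower bound of category 4 is assumed to be 46 so that the four ranges are contiguous; the extracted source text is unclear on that value.

```python
def encode_age(age):
    """Map an age in years to the category codes listed in Table 1."""
    if age <= 0:
        raise ValueError("age must be positive")
    if age < 12:
        return 1
    if age < 26:
        return 2
    if age < 46:
        return 3
    return 4  # assumed boundary: age >= 46


# Example: encode a few hypothetical student ages.
codes = [encode_age(a) for a in (11, 19, 26, 50)]
print(codes)  # [1, 2, 3, 4]
```

Encoding the attribute as an integer code keeps every input numerical, which the deep learning stage requires.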


2.2. Research step 

Deep learning models can process only numerical features [22]. The CNN is one type of deep learning
model used for training and testing on labeled data [23], while the goal of the machine learning models is to
achieve the best possible accuracy through experimentation [24]. To obtain the best accuracy from the newly
created features, this study uses the deep learning model for feature extraction during training, followed by
the machine learning models for classification. Figure 2 shows the main steps of the research.
Figure 2. Method steps


In the proposed research methodology, the 4,041 dataset records, each consisting of 12 attributes, are
arranged for processing with the CNN architecture as follows. N is the number of data records (4,041), and
A is the number of attribute groups; the attributes are grouped into 3 categories:

— Student personal data attributes: (age, marital, city of origin, and scholarship). 

— Historical attributes: (number of leave requests, amount of arrears, number of student activities, and number 
of achievements/certificate awards). 

— Academic value attributes: (semester 1-semester 4 achievement index: IPS1, IPS2, IPS3, IPS4). 
W is the number of attributes in each group of A, and K is 1 for the total category of graduation data
from the Faculty of Computer Science at Universitas Dian Nuswantoro. The input data for the CNN deep learning
stage therefore has shape N × A × W × K = 4,041 × 3 × 4 × 1. The 100 new features generated by the CNN are
then classified using 5 machine learning techniques (decision tree, random forest, Naive Bayes, SVM, and
K-NN) to obtain the best evaluation value. Recall, precision, accuracy, and F-measure are used in the
evaluation process [25], [26].
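
The arrangement of each 12-attribute record into a 3 × 4 × 1 input can be sketched in plain Python as follows. The attribute keys and the sample values are hypothetical placeholders chosen to mirror the three groups listed above; the paper does not give its column names or the order of attributes within a group.

```python
# Three attribute groups of four attributes each (hypothetical key names).
GROUPS = (
    ("age", "marital", "hometown", "scholarships"),               # personal
    ("leave_requests", "arrears", "activities", "achievements"),  # historical
    ("ips1", "ips2", "ips3", "ips4"),                             # academic
)


def to_image(record):
    """Arrange one record (a dict) as a 3 x 4 x 1 nested list."""
    return [[[float(record[name])] for name in group] for group in GROUPS]


def shape(nested):
    """Return the dimensions of a regular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# A made-up record, for illustration only.
sample = {
    "age": 2, "marital": 2, "hometown": 1, "scholarships": 0,
    "leave_requests": 0, "arrears": 1, "activities": 3, "achievements": 1,
    "ips1": 3.2, "ips2": 3.4, "ips3": 3.1, "ips4": 3.5,
}

image = to_image(sample)
batch = [to_image(sample), to_image(sample)]
print(shape(image))  # (3, 4, 1)
print(shape(batch))  # (2, 3, 4, 1)
```

Stacking all 4,041 records in the same way would give the N × A × W × K input described above.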


3. RESULTS AND DISCUSSION 

Figure 3 depicts the fundamental components of the CNN, including the 2D convolution layer, the 2D
pooling layer, the batch normalization layer, the fully connected layer, the non-linear activation function ReLU,
and the 2D max pooling layer. Figure 4 shows a sample of the feature extraction output taken from the dense
layer (dense feature) of the CNN, which has output shape (None, 100). Table 2 displays the results of applying
the machine learning classification methods to the data resulting from feature extraction.
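
To make the layer types named above concrete, the following pure-Python sketch applies a 2D convolution, a ReLU activation, and 2D max pooling to a single 3 × 4 input. It illustrates the operations only and is not the authors' architecture: the kernel values, the input values, and the pooling window are assumptions, and batch normalization and the fully connected layer are omitted.

```python
def conv2d_valid(image, kernel):
    """2D convolution (cross-correlation) without padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size):
    """Non-overlapping max pooling; incomplete windows are dropped."""
    ph, pw = size
    return [
        [
            max(feature_map[i + a][j + b] for a in range(ph) for b in range(pw))
            for j in range(0, len(feature_map[0]) - pw + 1, pw)
        ]
        for i in range(0, len(feature_map) - ph + 1, ph)
    ]


# Hypothetical 3 x 4 input (one record arranged as an image) and 2 x 2 kernel.
image = [
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 1.0, 3.0, 1.0],
    [2.0, 1.0, 0.0, 0.0],
]
kernel = [[1.0, -1.0], [-1.0, 1.0]]

feature_map = relu(conv2d_valid(image, kernel))  # 2 x 3 map
pooled = max_pool(feature_map, (2, 2))           # 1 x 1 map
```

In a full network, many such kernels are learned during training, and the pooled maps are flattened and passed to the dense layer that produces the 100 features.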
Figure 3. CNN stages


           ...         92         93         94         95         96         97         98         99
 0.516714  ...   0.300420  -1.539349   2.059091  -0.181639   1.774889  -0.294648  -0.005839  -1.971039
 0.429307  ...   0.047781  -0.437921   1.042610  -0.294263   1.459267  -0.145321   0.181368  -0.677201
 0.352103  ...  -0.113610  -0.476321   1.110847  -0.764649   1.287692  -0.077738  -0.524138  -0.677101
 0.681955  ...  -0.399359  -0.635935   2.034534   0.153822  -0.351255   0.034884   0.330606  -1.114779
 0.325843  ...   0.277672  -0.399737   0.637576  -0.416581   1.033822  -0.914346   0.160373  -0.857471
 1.273204  ...   0.076275  -0.892636   0.293125  -0.671600  -0.016163  -0.912109  -0.090875  -1.692824
 0.062645  ...   0.797004  -0.639692   0.619118  -0.343200   1.459566  -1.052675   0.151528  -0.614814
 1.229106  ...  -0.106534  -0.354537   0.545832  -0.037908   0.067480  -1.100398   0.453499  -1.278038
 0.272940  ...   0.087951  -0.705217   1.441743   0.469144  -0.243263  -0.574374   0.441524  -1.461822
-0.741434  ...  -0.014953  -0.223610   1.711095   0.290198   0.125478  -0.256920   0.205205  -1.292538


Figure 4. Sample data for 100 new features 


Table 2. Classification results 

Model                           Accuracy   Recall     Precision  F1-score
Naive Bayes CNN features        0.961253   0.961253   0.963212   0.961237
Naive Bayes actual features     0.708986   0.708986   0.713551   0.708235
SVM linear CNN features         0.956307   0.956307   0.957542   0.956299
SVM linear actual features      0.710635   0.710635   0.715877   0.709730
Decision tree CNN features      0.954658   0.954658   0.958495   0.954604
Decision tree actual features   0.723825   0.723825   0.727725   0.721801
KNN linear CNN features         0.938170   0.938170   0.938547   0.938174
KNN linear actual features      0.706513   0.706513   0.706571   0.706534
Random forest CNN features      0.938170   0.938170   0.938547   0.938174
Random forest actual features   0.706513   0.706513   0.706571   0.706534
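
In Table 2 the accuracy and recall columns are identical in every row. This is what support-weighted averaging of per-class recall produces, because the weighted recall reduces algebraically to overall accuracy. The sketch below computes the four metrics under that assumption; the paper does not state which averaging it used, and the labels in the example are made up.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for p in y_pred)
        prec_c = tp / predicted if predicted else 0.0
        rec_c = tp / count
        denom = prec_c + rec_c
        f1_c = 2 * prec_c * rec_c / denom if denom else 0.0
        weight = count / n  # share of true examples in this class
        precision += weight * prec_c
        recall += weight * rec_c
        f1 += weight * f1_c
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 1 = graduated on time, 2 = delayed.
y_true = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
y_pred = [1, 1, 1, 2, 2, 2, 2, 2, 1, 1]
scores = weighted_scores(y_true, y_pred)
```

For this example the accuracy and the weighted recall agree up to floating-point rounding, mirroring the pattern in Table 2, while precision and F1 differ slightly from them.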
Table 2 shows that classification based on the CNN-extracted features improved substantially over
classification based on actual features in terms of accuracy, recall, precision, and F1-score. Figure 5 displays
the training and testing accuracy. For training accuracy, the KNN algorithm achieves the highest value on
actual features (74.9%), while the SVM algorithm achieves the highest value on CNN features (95.2%).
Figure 5. Training and testing accuracy


Based on Table 3, the Naive Bayes algorithm achieves the best testing accuracy with CNN features,
96.1%, compared with 70.9% when using only actual features. The SVM algorithm follows with a testing
accuracy of 95.6% on CNN features, compared with 71.1% on actual features. The decision tree algorithm
achieves a testing accuracy of 95.5% on CNN features, compared with 72.4% on actual features. The KNN and
random forest algorithms likewise have higher accuracy with CNN features (93.8%) than with actual features
(70.7%). The CNN features therefore play a very large role in increasing accuracy compared with using actual
features alone.


Table 3. Comparison of accuracy values 


No  Model                           Training accuracy  Testing accuracy
1   Naive Bayes CNN features        0.944130           0.961253
2   Naive Bayes actual features     0.685290           0.708986
3   SVM CNN features                0.951556           0.956307
4   SVM actual features             0.681400           0.710635
5   Decision tree CNN features      0.940948           0.954658
6   Decision tree actual features   0.695191           0.723825
7   KNN CNN features                0.945898           0.938170
8   KNN actual features             0.748586           0.706513
9   Random forest CNN features      0.945898           0.938170
10  Random forest actual features   0.748586           0.706513
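
The size of the improvement can be read directly from the testing accuracy column of Table 3. The short Python sketch below computes the gain per classifier; the values are copied from the table, and the variable names are illustrative.

```python
# Testing accuracy from Table 3: (CNN features, actual features).
testing_accuracy = {
    "Naive Bayes": (0.961253, 0.708986),
    "SVM": (0.956307, 0.710635),
    "Decision tree": (0.954658, 0.723825),
    "KNN": (0.938170, 0.706513),
    "Random forest": (0.938170, 0.706513),
}

# Absolute gain in testing accuracy from using CNN features.
gains = {
    model: cnn - actual for model, (cnn, actual) in testing_accuracy.items()
}

# Classifier with the highest testing accuracy on CNN features.
best_model = max(testing_accuracy, key=lambda m: testing_accuracy[m][0])

for model, gain in gains.items():
    print(f"{model}: +{gain * 100:.1f} percentage points")
```

Every classifier gains roughly 23 to 25 percentage points of testing accuracy, with Naive Bayes showing both the highest CNN-feature accuracy and the largest gain.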
4. CONCLUSION

In this paper, we described a process for transforming data from its raw form into a representation that
still accurately depicts the raw data. The results show that using a deep learning CNN for feature extraction
improves the accuracy of the classification results for all of the machine learning algorithms that were
implemented. In future work, the authors will run experiments on more complex data, with a larger and more
diverse set of attributes and records. They will also compare feature selection approaches for the attributes
used, as well as hyperparameter settings for the CNN architecture.